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Abstract 

Non-stationarity of the event rate is a persistent problem in modeling time series of 
events, such as neuronal spike trains. Motivated by a variety of patterns in neurophysi- 
ological spike train recordings, we define a general class of renewal processes. This class 
is used to test the null hypothesis of stationary rate versus a wide alternative of renewal 
processes with finitely many rate changes (change points). Our test extends ideas from 
the filtered derivative approach by using multiple moving windows simultaneously. To 
adjust the rejection threshold of the test we use a Gaussian process, which emerges as 
the limit of the filtered derivative process. We analyze the benefits of our multiple filter 
test and study the increase in test power against a single window test. We also develop 
a multiple filter algorithm, which can be used when the null hypothesis is rejected in 
order to estimate the number and location of change points. In a sample data set of 
spike trains recorded from dopamine midbrain neurons in anesthetized mice in vivo, the 
detected change points agreed closely with visual inspection, and in over 70% of all spike 
trains which were classified as rate non-stationary, different change points were detected 
by different window sizes. 

1 Introduction 

In neurophysiology, spike trains are often analyzed with statistical models based on point 
e.g., renewal processes (Perkel et al. 



processes, 



1967a Johnson, 1996 Rieke et al., 1997 



Kass et al. 2005, Nawrot et al. , 2008). A large field of statistical neuroscience focuses on 



the coordination between parallel point processes (Perkel et al. , 1967b Brown et al. , 2004 



Griin and Rotter, 2010). In many models used for such analyses, rate stationarity is a crucial 



assumption, and variations of the underlying firing rate can affect the results of the applied 



techniques (e.g., Brody 1999; Griin et al. , 2003). In order to avoid such problems, several 
authors have suggested local techniques, which involve the separate treatment of sections with 



approximately stationary rate (see, e.g., Griin et al. , 2002 Staude et al. 2008 Schneider 2008 ) 



when spike trains show non-stationary properties. Therefore, it is important to capture these 
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non-stationary properties, i.e., to detect the violation of rate stationarity and to locate the 
changes in the firing rate of neurons. 

In this paper we contribute to the change point analysis of point processes. Motivated 
by the modeling of empirical data from neurophysiology, we define a general class of renewal 
processes. In this class, we test the null hypothesis of rate stationarity versus a wide alter- 
native of renewal processes with finitely many rate changes. Our test extends ideas from 



the filtered derivative approach (Steinebach and Eastwood, 1995: Bertrand, 2000) by using 



multiple moving windows simultaneously instead of just one moving window. To adjust the 
rejection threshold of the test we use a Gaussian process, which emerges as the limit of the 
filtered derivative process. We analyze the benefits of our multiple filter test and study the 
increase in test power against single window techniques. Additionally, we develop a multiple 
filter algorithm, which can be used when the null hypothesis is rejected in order to estimate 
the number and location of change points. This procedure can serve as a preprocessing step, 
splitting up the time series into sections, in which the analyses of interest can be performed 
separately. As an example, Figure [T] illustrates a point process with non-stationary rate, in 
which we aim to estimate the number and location of change points. 
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Figure 1: A time series of events for which visual inspection suggests a non-stationary rate. For 
a general class of point processes, we present a statistical test and an algorithm based on multiple 
windows in order to identify the number and location of change points in the rate. 



For identifying the number and positions of change points in time series, many techniques 
are available in mathematical statistics. For an overview see, e.g., Basseville and Nikiforov 



(1993); Brodsky and Darkhovsky (1993); Csorgo and Horvath (1997). Typically, these tech- 



niques are derived in the context of time series models with independent and identically 
distributed (i.i.d.) random variables. The classical parametric test uses a maximized likeli- 
hood quotient in order to analyze the entire process which leads to so-called pontograms in 
point/renewal process theory (see Kendall and Kendall, 1980 Csorgo and Horvath, 1987 



Steinebach and Zhang, 1993 Csorgo and Horvath, 1997). The resulting test statistics have 



extreme- value type limits, which typically show slow rates of convergence. As a second type of 
approach, moving window analyses in the context of renewal processes have been studied by 



Steinebach and Eastwood (1995). These local concepts successively investigate the life times 



of the point process instead of referring to the entire process. Their asymptotic properties 
typically yield faster rates of convergence. 

Motivated by applications, we present two extensions of existing methods: first, the high 
variability of point processes observed empirically requires a sufficiently general class of point 
process models. Accordingly, we first introduce in section [2] a new class of renewal processes 
with varying variance (RPVV), which allow a certain variability in the variance of the life 
time distributions. As a second extension to existing methods, we take into account that 
rate changes can occur on fast and slow time scales within the same time series. We propose 
a multiple filter technique that applies multiple windows simultaneously. This technique 
consists of a statistical multiple filter test (MFT) for the null hypothesis of rate stationarity 
and a multiple filter algorithm (MFA) for change point detection. 

In section [sj we first extend techniques introduced by Steinebach and Eastwood ( 1995 ) 
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to our class of RPVVs. In particular, we prove asymptotic results for a moving average 
approach called filtered derivative, which is based on comparing the number of events in 
adjacent windows. We then introduce a statistical test that is based on a set of filtered 
derivative processes, each process corresponding to one window size. The maximum over all 
processes serves as a test statistic, indicating deviations from rate stationarity if this maximum 
exceeds a threshold Q. By scaling each process, we attempt to give every window a similar 
impact on the maximum distribution. 

For practical application, we provide in section |4] a multiple filter algorithm for change 
point detection, in which the results obtained by multiple window sizes are combined. For 
each individual window, the algorithm successively searches for extreme values of the filtered 



derivative, similar to the techniques proposed by Bertrand (2000); Bertrand et al. (2011). 

In section [Sj we evaluate the MFT, discuss the significance level in finite data sets and 
compare it to bootstrap methods. Most importantly, we show by exemplary simulations that 
the MFT can have an increased test power over single window techniques even when a best 
window size is known. Thus, by using multiple window sizes, one can detect rate changes in 
fast and slow time scales simultaneously, increase the test power and avoid the problem of 
choosing one near-optimal bandwidth (cmp.. 
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Finally, we apply the MFT to a sample data set of spike train recordings obtained as 
spontaneous single-unit activity from identified dopamine neurons in the substantia nigra 
of anesthetized mice (section [6]) . In the sample data set, the detected change points agree 
closely with visual inspection. In over 70% of all spike trains, which are classified to have 
non-stationary rate, different change points are detected by different window sizes. 



2 The Point Process Model 

In this section we extend the assumptions of classical renewal processes by introducing a 



class of renewal processes with varying variance (RPVV) (section 2.1). These processes are 



assumed rate stationary, but the variance of life times may show a certain degree of variability. 



Examples of such processes are given in section 2.2 For the alternative hypothesis (section 



2.3) we combine several null elements, resulting in processes with piecewise stationary rate. 
In this model we aim to detect rate changes irrespective of other point process properties, 
such as the variability of the life times or even changes in the variability of life times. 

2.1 Renewal Processes with Varying Variance (RPVV) 

We write a point process ^ as an increasing sequence of events 

< 5i < ^2 < ^3 < • • • 

where Si denotes the occurrence time of the i-th event, for i = 1,2,. . . Alternatively, ^ is 
determined by its life times (^i)i>i, where 

^1 = Si and = Si - for i = 2, 3, . . . 

or by the counting process {Nt)t>o, where 

Nt = max{i > 1 1 5^ < t}, t>0, (1) 
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with the convention max0 := 0. 

Under the null hypothesis, we assume that a spike train can be described as an element 
<I> of the following family of rate stationary processes, which we term renewal processes with 
varying variance (RPVV). 

Definition 2.1.1. Renewal process with varying variance (RPVV) 

Let T > 0, and let ^ be a renewal process restricted on {0,T] whose life times, ^1,^2, 
are assumed to be independent, positive and square-integrable random variables with positive 
variances, such that for some /i, o", c > and all e > 0, with asymptotics as n ^ 00, we have 

Rate Stationarity: = ^, for all i G N, (2) 

1 " 

Variance Regularity: —7 Var(^j) — )• o"^, (3) 

n 

1=1 

Undeberg Cond^on: Er=i ^Ife - /^)^1{(,^.)^>.^ E?.. Va.fe)}] ^ 

Uniform variance bound: sup Var(^j) < c, (5) 

1 " 

SLLN for squared life times: - ^{Cf - ^ a.s. (6) 

1=1 

Thus, an RPVV can be a renewal process with i.i.d. life times and thus, constant variance 
of life times. This applies for example to Poisson processes or to processes with independent 
and r(p, A)-distributed life times, called here Gamma-processes. In addition, the variance of 
life times can also show a certain variability as specified in ([s]) and ([5|. Assumptions Q - 
(6) are technically sufficient for the asymptotic results that support our methods: Condition 
(3) imposes a regularity of the life times' variances over time. The Lindeberg condition Q 
is later used for process convergence to Brownian motion that allows to deduce asymptotics 
for the related counting process. There, condition ([s]) will be used additionally. Assumption 
([g]) is the strong law of large numbers (SLLN) for the squares of the life times, which will 
be needed for strong consistency of an estimation of o"^ below. Note that by Kolmogorov's 
conditions (Petrov 1995, Thm 6.8), (^f)i>i satisfying the SLLN is equivalent to 



J2pi\^f-mf]\>^)<^, (7) 
1=1 

00 ^ 

E ^^[(^' - ^E[e.'])'l{|c^EK?]|<^}] < 00, (8) 

i=l 

1 - E[ef])l||52_E[^2]|<„|] ^0, as n ^ 00. (9) 



n 

i=l 



The most important assumption Q states that in an RPVV, the mean rate is constant 
across time. We therefore also use the short notation $(/x). 

2.2 Examples of RPVVs 

Here, we give examples of point processes that satisfy the assumptions of an RPVV from 
Definition 2.1.1 We assume rate stationarity (condition Q). Figure [2] shows examples of 
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such processes. Panels A-D indicate the evolution of variances of life times, and panels E-F 
illustrate point processes with the corresponding variances and Gamma-distributed life times. 
Because Gamma-processes have been used frequently in order to describe neuronal spiking 
activity (cmp. Nawrot et al. , 2008, and the references therein), we also use Gamma processes 



for all simulations in the present article, choosing suitable combinations of rate and regularity 
parameters for each simulation. 

The most simple example of an RPVV is a process with i.i.d. life times (Figure[2]A and E). 
As a second example (Figure[2]B and F), an RPVV can be a process in which the variances of 
life times converge to a constant. Third, the variance of life times can alter regularly between 
two different values (Figure [2] C). The corresponding point process (panel G) shows regular 
and irregular sections. This example can be extended such that the mean variance of life 
times is constant at equidistant grid points g, 2g, . . . (Figure [2] D and H). 
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Figure 2: Examples of RPVVs according to Definition 2.1.1 (A-D) The variances Var(^i) of life 
times ^i, is indicated by points. Var(^i) can be constant (cr^) (A), can converge to a constant 
(B), or can be a step function alternating between different, fixed values in a regular manner (C). 
In (D), the mean variance of the g life times (^i, . . . ,^g), (^g+i, . ■ . ,■^23), etc. is a constant cr^. (E- 
H) Realizations [T = 100) of point processes with r(pi, Ai)-distributed life times with constant 
expectation E[^,;] = pi/\i ~ 1, i.e., pi — \i. The variances Yar{£_i) — Pi/Xf = are given 

in (A-D), respectively. (E) Independent and r(5, 5)-distributed life times with constant variance 
Var(^j) = cr-^ = 1/5. (F) Var(^i) = l/Xi <j'^ = 0.1. (G) The variance alternates in a regular manner, 
changing after g/2 = 20 life times between pi = Xi ~ 1 (Poisson process) and Pi = Xi — 20 (a regular 
Gamma process). (H) For g = 40, the mean variance of the g life times (^1, . . . ,£,g), {£,g+i, ■ ■ ■ ,£,2g), 
etc. equals unity. 
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2.3 The Full Model 



In contrast to the null assumption, the alternative hypothesis assumes that <I> is piecewise an 
RPVV, where the mean rate can change between the different sections. Formally, we assume 
that under the alternative hypothesis, a spike train is an element of the class constructed in 

Construction 2.3.1. Let T > 0, and let C denote the set of all finite subsets of (0,T]. 
Assume C := {ci, . . . , Cfc} G C, with ci < • • • < Cfc. 

At time start k + 1 independent RPVVs ^i{iJ,i), . . . , ^k+iifJ-k+i) with 



Let Co := 0, 



+1 •= 



Hi / Hi+i for i = 1, 
T and define 

k+l 



(10) 



i=l 



where $ 



i|(ci_i,Ci] denotes the restriction of to the interval {ci-i,Ci 



The times ci, . . . , Cfc are called change points. An example of a point process generated 
according to this construction is shown in Figure [3j The resulting rate of <I> is a step function 
with change points ci , . . . , . 
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Figure 3: The change point model combines a set of RPVVs. A realization of a process <& on (0, T] 
that results from construction |2.3.1[ $ has three change points Ci , C2 , C3 and originates from the four 
RPVVs $1, . . . , $4, jumping from process <i>i to at change point 

We now define a model set ^ := ^{T) to be the family of processes that derive from 



construction 2.3.1 and test the null hypothesis 
Hq: ^ G ^ with C = 0, i.e., $ is an RPVV, in particular rate stationary, 

against the alternative 
Ha'- ^ G ^ and C 7^ 0, i.e. there is at least one change point. 



3 The Multiple Filter Test (MFT) 

In order to test the above null hypothesis of rate stationarity in the model set we derive 



here a multiple filter test (MFT). Section 3.1 summarizes the construction of the test. Details 



on parameter estimation and limit results are given in sections 3.2 and 3.3 
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3.1 Derivation of the MFT 

The main idea of the MFT is to extend a filtered derivative technique (see the contributions 



Basseville and Nikiforov 1993: Brodsky and Darkhovsky, 1993; Csorgo and Horvath 1997), 



which shdes two adjacent windows of size h and compares the number of events in the left and 
right window. Formally, let T > and <I> be an element of the model set For h G (0, T/2] 
we define an analysis region t/j := (/i, T — h]. Let iV(a_b](^) denote the number of elements of 
$ in the interval (a, b] C (0, T]. For each point t £ Th we compare the number of events 

iV/e := N^t^h,t]{'^) and Nri := N^t,t+h]{^) 
in the left and right window (Figure |4] A) . 

1 = 700 
h = 75 



Xh 

N|e Nri 
^ ( ]( ] 



D 



Q-2.38 (a = 5%) 



-3 
-6 



;M- 1.17 





T 



Figure 4: Illustration of the computational steps and processes involved in the MFT. The MFT is 
applied here to a stationary process on (0, 700] with independent and r(0.25, 5)-distributed life times. 
(A) For one window size h = 75, the number of events in the left and right window, Nie,Nri, are 
derived for every t G r^. (B) The process {Gh,t)t£Th for one window h ~ 75. (C) The scaled process 
[Rh,t)t£Th for one window h — 75. (D) All scaled processes [R}^ t)t£Tu for h G H = {25,75,125}. 
Different gray shades indicate different window sizes, the asymptotic threshold Q is represented by 
a dashed line. Here, the test statistic M — max^^t Rh^t < Q and thus, the null hypothesis of rate 
stationarity is not rejected. 

A large difference Nri — can indicate deviations from the null hypothesis of rate 
stationarity. But because the variance of the difference depends on process parameters, the 
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difference Nri — Ni^ will be normed as follows 



GH,t := Gh,t{'^) := ^" . if s > 0, (11) 

s 

and Gh,t := if s = for all t £ Th (Figure [4] B). The term s den otes an estimator 



of ^/YailNri — Nig], which is defined in (20). We will show in section 3.3 that the pro 



cess {Gh,t)teTh converges to a 2/i-dependent Gaussian process {Lh^t)teTh- The limit process 
{Lfi^t)teTh is a continuous functional of a standard Brownian motion and depends only on T 
and h. In particular, it is independent from the parameters of ^ such as, e.g., the rate or 
regularity. 

Large absolute values of Gh,t indicate potential deviations from rate stationarity. There- 
fore, the maximum 

Mh := max \Gh,t\ 

can serve as a test statistic for a single window. 

In order to combine multiple window sizes of a finite set H C (0,T/2], we consider a set 
of stochastic processes {{Gh,t)t<^Th\h S H}, which are all derived from the same underlying 
point process Each process {Gh,t)t&Th results in one maximum Mh- Instead of using the 
raw maxima M^, we suggest to standardize M^ because the distribution of M^ depends on h. 
As mentioned above, the process {Gh,t)teTh is 2/i-dependent, and a smaller h results in weaker 
temporal dependencies of the process. This leads to higher chance fluctuations in {Gh,t)t for 
smaller h and thus, a higher rejection threshold. 

If the expectation and variance of M^ were known, we could use the term 

v'Var(MO 

in order to give every window a similar impact on the global maximum of all processes. Here, 
we approximate the expectation and variance using simulations of the set of limit processes 
{{Lh,t)tGTh\h £ H}- Defining M^ := sup^g^^ approximate the expectation E[M/j] 

by the empirical mean and the variance Yar(Mh) by the empirical variance v{M^J. The 
resulting test statistic M across all windows is defined as the global maximum 

M := max ( ^^zg ^i . (13) 

Finally, we reject the null hypothesis at level a if M > Q := Q(a,T, H). The threshold 
Q is defined such that under the null hypothesis, M > Q with probability a. In order to 
derive Q, one can again use the limit processes {{Lh^t)t£Th,\h £ H} and approximate Q by the 
empirical quantile of 

M* := sup 1— — " . (14) 
h&H \ ^Jv{Ml) J 



Note that all limit processes [L^^tji are derived from the same Brownian motion in order to 
ensure comparability with the processes {Gh,t)t, which result from the same point process <I>. 
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For change point detection explained later in section [4] and for graphical illustration, we 
use the scaled process 



Ml 



(Figure gC), 



(15) 



which scales {Gfi,t)teTh accounts for the scaling of the maxima. Because the maximum of 
all processes {Rh,t)i 



M = max max t 



(16) 



is identical to the above global test statistic, it can be read directly from the graph. The 
processes {Rh,t)teTh and their comparison with the threshold Q are illustrated in Figure [4] D. 

3.2 Variance Estimation 



By Definition 11 of our auxiliary variables Gh,t, we need to specify an estimator s for the 
variance of N^i — Ni^. The idea is to estimate the variance from the life times of the elements 
in the left and right window of Gh^f 

Let ) ^2 ) • • • be the life times of an RP VV with constant /i and o"^ as in ^ and ^ . 
Given T and h, for every t G we define 



7Ze(t,/i) := {^i : Si,Si-i G (t - h,t],i = 1,2, . . .}, 



(17) 



the set of all life times that correspond to events in the left window. We relabel this set 

of life times ^'i,^2^,--- (Figure [5|. Analogously for the right window we obtain ^ri{t,h) = 

r tri tri \ 
LSI ! S2 ) • • 



(- 



S2 



Nn = 5 



r 




en en 



t-h 



"1 
T 



t t+h 

Figure 5: Illustration of life times involved in the estimation of the variance of N^i — Ni^ at time t. 

The empirical mean of the life times in the left window is denoted by 

Aie := A/e(i, h) := -/ie{t, h), if |7/e| > 0, (18) 

and file := if \^ie\ = 0. The empirical variance of the life times is 

al := &l{t, h) := v{^ie{t, h)), if > 1, (19) 

and af^ := if < 1. The bar denotes the empirical mean, v{-) denotes the corrected 
sample variance of ^ie{t, h), and | • | denotes the number of elements. Analogously we define 
firi and (T^j for the right window. 

As an estimator for the variance of Nri — Ni^ we propose 
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and := otherwise where A denotes the minimum. Note that is zero by definition if 
the number of events is less than two in any window. We prove strong consistency of these 
estimators in an appropriate asymptotic setting at the end of the Appendix (within Step III 
there). Heuristically, this estimator is suggested by the fact that under our conditions on the 
hfe times of the RPVV we obtain for the number Nt of events up to time t that, as t — )• oo, 
we have 

A^N{0,1) and Yai[Nt]r^aH/^\ (21) 
where — ^ denotes convergence in distribution. Hence, we obtain 

(2 2 
^ + % 

3.3 Limit Distribution of {Gh,t) under Hq 

In order to compute the test statistic M and choose the rejection threshold Q we derive a limit 
of the process {Gh,t)t&Thi choosing an asymptotic setting in which time T and window size h 
grow proportionally. As the limit we identify a 2/i-dependent Gaussian process {Lh^t)t£Th on 
Th that does not depend on the parameters of the process 

To make this asymptotic statement precise, let <^ be an element of Hq with life times 
^i,^2,--- We consider two extended versions of {Gh,t)t&Th which differ by scaling. First 
{G^hj)teTh is scaled by the estimated variance, second (r|j^j)tgT-^ is scaled with the exact 
(but unknown) order of the variance: 

:= ~ ~ - , if 5(ni,n/.)>0, (22) 

and G^i^j := otherwise, and 



p(n) \^-n(t+n) ^-nzj \^'m ^'n(t~n)j , . 



for all t > h and n = 1,2,... Recall that Nf denotes the number of life times up to time t, 
and the estimator s is defined in (20). We consider the three processes {Gh,t)teTh, {G^ht)i'^Th 



and (J^^^'l)t^rh as cadlag processes in the Skorokhod topology. 

The asymptotic analysis is given by letting n — )• oo. To define the limit process, let 
W = (Wj)t>o denote a standard Brownian motion on [0, oo). For /i > we define for all i > /i 

. {Wt+h - Wt) - {Wt - Wt-h) 



The process {Lh^t)t>h is a 2/i-dependent Gaussian process, with zero mean and autocovariance 
given as 



1 - ^\v\, if \v\ £ [0,h], 

-l + ^\v\, if \v\ £ {h,2h], (25) 
^0, if \v\ > 2h, 
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for all suitable u,v (Figure [6]). Note that the autocovariance only depends on the window 
size h and the time lag v of Lh^t and Lh,t+v 

In the Appendix we show the following process convergences. These extend results of 



Steinebach and Eastwood ( 1995 ) 



the null hypothesis. Then for the processes 

{G^hl)' (rK) and {Lh,t) defined in \2^, and 



Proposition 3.3.1. Let T > and h £ (0,T/2] be a window size. Let $ be an element of 
the null hy 

(24), as n ^ CO, we have 



h,t )t&rh 



and 



fdd 



{Lh,t) 



(26) 



where — ^ denotes weak convergence in the Skorokhod topology and denotes weak conver- 
gence of all finite dimensional marginals. 



4 Multiple Filter Algorithm (MFA) for Change Point Detec- 
tion 

In section |3j the first part of the MFT was presented as a test for the null hypothesis of 
rate stationarity versus the alternative of at least one rate change. After rejection of the 
null hypothesis, we intend to identify the number and location of change points. To this 
end, we propose an algorithm that combines the results of multiple window sizes. It consists 
of a procedure for change point detection on the basis of individual windows (single filter 
algorithm - SFA, section |4.1[), a nd a multiple filter algorithm (MFA) for the combination of 
individual windows (section |4.2[ ). 

4.1 Single Filter Algorithm (SFA) 

For the detection of change points with a single window of size /i, we apply a common 
method to the scaled filtered derivative process {Rh^t)t<^ThJ which successively estimates change 



points from the maxima of the process (see the contributions Basseville and Nikiforov, 1993 



Bertrand 2000 Bertrand et al. , 20111 Antoch and Huskova, 1994). Similar procedures have 



been shown to give consistent estimates of the number and location of the change points 



under mild conditions in Gaussian sequence change point models (Huskova and Slaby, 2001). 



The SFA for one h £ H works as follows. First, observe the maximum of the process 
{Rh,t)tGTh- If niaxii?/if > Q, this indicates deviations from rate stationarity. The time ci 
at which this maximum is taken is an estimate of a change point because the maxima are 
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expected at the change points if the difference between change points is sufficiently large 
(Figure [7] A) . More precisely, one should note that the sample path of {Rh^t)teTh is a step 
function, so that the set of maximizers is an interval. We define ci as the infimum of this 
interval 

ci := inf{argmaxi?/i t}. 

Second, we observe that a change point which occurs at time c affects the behavior of the 
process {Rh^t)t&Th within the /i-neighborhood of c. 



Bh{c)■.= {c-h,c + h)r^Th (Figure [7| A), 



(27) 



while leaving all points outside of Bh[c) unaffected. Therefore, the /i-neighborhood of ci is 
omitted in the subsequent analysis. If the remaining process {Rh,t)teTh\Bh{c) outside of Bh{c) 
exceeds Q, this indicates another deviation from rate stationarity because a change point at c 
cannot cause this deviation. Therefore, we successively identify change points as the maxima 
of {Rh,t)t outside the union of all Bh{ci) of detected change points, until the process {Rh,t)t 
is smaller than Q in all remaining intervals (Figure [7] B). 
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Figure 7: The SFA and MFA. (A) A change point at time c affects the process {Rh,t)t within the 
/i-neighborhood of c, and the maximum of {Rh,t)t is expected at c. (B) The SFA successively searches 
for maxima of {Rh,t)t- When the maximum is larger than Q, the maximizer ci is the first change 
point estimator. Then the /i-neighborhood of ci is omitted, and the procedure is iterated on the 
remaining process until the maximum remains smaller than Q (underlying process different from A). 
(C) Schematic representation of the MFA for a window set H = {hi, h2, h^} (underlying process 
different from A and B). The change points detected by SFA are marked as vertical bars, and their 
/li-neighborhoods are indicated by horizontal lines {hi, black, /12, gray, /13, light gray). The MFA first 
accepts all change points detected with the smallest window hi (black diamonds). Among the change 
points detected by ft,2 (gray), only the first one is rejected because its /12-neighborhood contains an 
accepted change point. Among the change points detected by the largest window (light gray), only 
the second one is added to the list of accepted change points because its ft,3-neighborhood does not 
contain formerly accepted change points. Diamonds indicate finally accepted change points. 
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4.2 Multiple Filter Algorithm (MFA) 

We now propose a multiple filter algorithm with which the results of the SFA of different 
windows can be combined. This integrates the advantages of multiple time scales because 
large windows are more likely to detect small rate changes and small windows can be more 
sensitive to fast changes. In particular, using only a large window of size h, the SFA can fail 
or mislocate change points ci, C2 with distance smaller than h. This suggests to prefer change 
point estimates of smaller windows. 

The MFA can be summarized as follows (Figure [7] C). Let H = {hi, /12, . . . , hn} be the set 
of involved windows, with hi < . . . < hn- Derive the threshold Q for this set H as described 
in section [sj For all hi, detect change points via SFA. Let Ci := {c\, . . . ,c^.} denote the set 

of change points estimated with window hi. Then, define a set of accepted change points C, 
which is first set to C := Ci, i.e., all change points estimated by the smallest window. Among 
the change points C2 associated with /i2, only those are added to C whose /i2-neighborhood 
does not include a formerly accepted change point cj E C. The remaining estimates Cj E C2 
are assumed to be affected by change points that have already been estimated and therefore, 
omitted in the further analysis. This procedure is iterated by successively increasing the 
window sizes up to hn- 



4.3 Application to a Simulated Point Process 

Figure [8] illustrates the application of the MFA to a simulated point process with three 
change points. All change points are detected by the MFA, and the estimated change points 
correspond closely to the true change points. Consequently, the rate estimates agree closely 
with the true rates. 

Figure [8] also shows that different window sizes were used for the detection of different 
change points: While the first change point was detected by the smallest window hi = 10, 
the second was detected by /12 = 25 and the third by /13 = 150. This supports the idea 
of combining several windows: if change points are close together (e.g., ci and C2 in Figure 
[s]), small windows are preferable because large windows tend to be affected by both change 
points and thus, lead to imprecise estimates. On the other hand, small rate changes require 
large windows, which have a higher test power. Indeed, none of the individual windows could 
detect all change points (data not shown). 



4.4 Choosing the Window Set H 



The previous example and the simulations that follow in section 5.2 show that multiple filters 
can increase the probability to detect change points. This is because rate changes in fast 
and slow time scales can be detected simultaneously using multiple windows. However, using 
too many windows increases the threshold Q applied for change point detection, which can 
also decrease the test power in certain settings. Therefore, we discuss here in which way Q 
depends on the window set H and give recommendations for the choice of H. 

Because Q depends only on T, H and a, we investigate its dependency on T and H for 
a = 5%. Figure|9]A shows that if only one window is used, the single window threshold Q does 
essentially not depend on h or T. Because the test statistic is normed for every h (equation 



(12)), any window size h and any simulated time T results in a threshold of about Q ~ 1.8. 
In order to study the influence of one additional window on Q, we fix hi = 10 and illustrate 
the double window threshold Q for /12 E {11, 12, . . . , 15, 20, . . . , 150} in Figure [9] A. The 
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Figure 8: Application of the MFT to a simulated point process on (0, 700] with three change points 
at ci = 150, C2 — 180 and C3 — 500. The hfe time distributions were exp(8) in [0, ci], r(2, 26) 
in [ci,C2], exp(18) in [02,03] and r(2,33) in [03, T], corresponding to rates of 8,13,18 and 16.5 in 
the respective intervals. (A) Gray curves indicate the processes {Rh,t) for window sizes h G H = 
{10, 25, 50, 75, 100, 125, 150}. The simulated threshold Q = 2.75 is indicated by the dashed hne. The 
estimated change points Ci = 149, 62 = 182, £3 = 511 are marked by diamonds. As indicated by 
the grayscale of the diamonds, each change point was detected by a different window size. (B) Rate 
histogram of the underlying point process with real (gray) and estimated (dashed) rate profiles. 



threshold Q increases to about 2.23. Smaller /i2 close to hi = 10 lead to smaller increases 
than larger windows because the processes {Rhi,t)t and {Rh2,t)t show higher correlation if 
]/ii — /i2l is small. In Figure [9] B we successively add windows of increasing size to the set 
Hj = {10,25,50,75,100,125,150}. The increase in Q from Hi = {hi} to H2 = {/ii,/i2} is 
about the same as from H2 to Hj. Similarly, adding more windows between 10 and 150 would 
only slightly increase Q (data not shown). 

Because additional windows have minor impact onQ, we recommend the following window 
choice: The smallest window hi should be restricted such that the asymptotic significance level 
is approximately kept. To this end, section 5.1.1 investigates the empirical significance level 
for stationary Gamma processes with different regularity and rate parameters. The maximal 
window hjnax is only limited by T/2. The choice of the grid between hi and h^ax can be 
guided by the following principles: Choosing a narrow grid can detect change points in a broad 
class of time scales. However, it will also slightly increase the threshold Q and thus, reduce the 
probability to detect change points at all. Additionally, it increases the computational effort 
required for the performance of the test. Here, we study the performance for the window set 
H = {10, 25, 50, 75, 100, 125, 150}. 
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Figure 9: Dependence of Q on and T for a = 5%. (A) For SFA, the threshold Q does basically 
not depend on T or h (connected points at about Q « 1.8, color codes for T as in B). Choosing 
h\ — 10, the second window /i2 increases Q to about 2.23, with a stronger increase for larger /12 
(points at about Q = 2.2, color codes for T as in B). (B) Adding more windows from the set H — 
{10, 25, 50, 75, 100, 125, 150} leads to only slight increases in Q. Dependence on the recording time T 
is weak. 10000 simulations were used to calculate the empirical mean and standard deviation for the 
standardization of the limit process {Lh.t)t (equation (14)). 



5 Evaluation of the MFT 

5.1 Practical Applicability of the MFT 

5.1.1 Empirical Significance Level in Simulations 

As discussed in section [331 the proposed MFT is an asymptotic procedure, providing asymp- 
totic significance level a. Therefore, we use simulations in order to investigate under which 
conditions the asymptotic significance level is kept for small rates in the finite setting. We 
simulate rate stationary renewal processes with Gamma-distributed life times in order to in- 
vestigate the empirical significance level of the asymptotic MFT. We focus on the parameters 
T = 700, H = {10, 25, 50, 75, 100, 125, 150} and an asymptotic significance level a = 5%. 

Figure [T0| shows the empirical significance level obtained in 2000 simulations as a function 
of the mean (^) and standard deviation (cr) of the independent and Gamma-distributed life 
times. Under high irregularity, i.e., if a is high, the test remains conservative. With increasing 
regularity, the rate required to obtain an empirical significance level of 5% is increasing. For 
low rates and high regularity, the percentage of false positives of the MFT tends to be slightly 
larger than the asymptotic significance level. In the very extreme case of almost perfect 
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regularity and low rates (white area in the bottom right corner), the MFT should not be 
applied because the empirical significance level is largely enhanced. In all but these extreme 
parameter combinations, the detection of more than one change point was very unlikely under 
the null hypothesis (< 1%, data not shown). Thus, the detection of more than one change 
point can almost always be considered a strong indication of rate non-stationarity. 



20 10 6 Rej. Prob. 




0.05 ,, 0.10 0.15 



Figure 10: Simulated rejection probability of the MFT for processes with i.i.d. Gamma-distributed 
life times (T = 700, H = {10,25,50,75,100,125,150}, 2000 simulations). For high irregularity, the 
test tends to be conservative. With increasing regularity, the rate required to keep the asymptotic 5% 
significance level increases, /i and a denote the mean and standard deviation of life times. 

In summary, one needs to keep in mind for practical applications that the error rate can 
be slightly enhanced for regular processes with low rates. However, a false estimation of a 
non-existing change point is not problematic if one primarily intends to split up the time 
series into rate stationary sections. If the significance level needs to be kept strictly even for 
small rates, the window size needs to be increased. This has the same effect as increasing 



the rate because the approximation of {Gh,t)t to the limit process {Lh^t)t (equation (3.3.1)) 
mainly depends on the mean number of events per window. 

5.1.2 Comparison of the MFT to a Bootstrap Test 

The preceding section shows that the MFT should be treated carefully in situations with 
limited rates and high regularity because the asymptotic significance level is not precisely 
kept. Therefore, one might consider deriving Q with a bootstrap procedure, as suggested. 



e.g., by Huskova and Slaby (2001). The distribution of M can then be derived directly by 
permutation of the life times and recalculation of M in the permuted process. By construction, 
this procedure yields an empirical significance level of 5% if the underlying process is a 
classical, stationary renewal process. However, it has two shortcomings: first, it requires high 



computational effort because the process {Rh,t)t (equation (15)) needs to be recalculated for 
every realization. Second, permutation can only be applied if the life times are independent 
and identically distributed. 

Therefore, we compare the MFT with a bootstrap test when the underlying process does 
not comply with the assumption of independent and identically distributed life times, i.e., 
when the underlying process is a rate stationary RPVV but no classical renewal process. 
To this end, we simulate rate stationary processes with Gamma-distributed life times. The 
variance of life times changes every g/2 life times, alternating between two values. As shown 
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in section [Z2| (Figure [2| C) , the resulting process is an RPVV. 

In order to reduce computational effort for the bootstrap test, we replace Rh^t by only 
computing \Nri{t,h) — Nie{t,h)\, the absolute difference of the number of events in the left 
and right window, for every h and t and derive the maximum of these values as a test statistic. 
The 95%-quantile of the distribution of this test statistic is then estimated in permutations, 
and the null hypothesis is rejected if the maximum is larger than its estimated quantile. 

Table [T] shows the resulting significance levels for the MFT and the bootstrap procedure. 
The MFT roughly keeps the 5% significance level in all simulated scenarios, whereas the 
bootstrap test rejects the null hypothesis in about 3%, 7% and 15% of the simulations. This 
indicates, as expected, that permutation tests are not necessarily robust against changes in 
the variance of life times and should, therefore, not be applied under such conditions. 



r(0.5,15)'-4r(5, 150) 


MFT 


Bootstrap 


A) g = 5000 


(5.9 ± 0.7)% 


(3.0 ± 0.5)% 


B) g = 10000 


(4.7 ± 0.7)% 


(6.6 ± 0.8)% 


C) g = 20000 


(5.5 ± 0.7)% 


(15.1 ± 1.1)% 



Table 1: Comparison of the significance level of the MFT and a bootstrap test for simulated RPVVs. 
Here, the distribution of life times changes every g/2 life times from r(0.5, 15) to r(5, 150), leading to 
alternations between regular and irregular patterns. The grid size is A. g — 5000, B. g ~ 10000, C. 
g = 20000. 1000 simulations with H = {10, 25, 50, 75, 100, 125, 150} and T = 700 at level a = 5% were 
performed in all cases, 1000 permutations were used for the construction of the bootstrap threshold. 



5.2 Multiple Filters Increase the Test Power 

We have already seen in the example in section |4.3| that multiple windows can increase the 
probability to detect a change point. One explanation is that the simultaneous use of multiple 
filters avoids the problem of choosing the most appropriate single window size. But more 
importantly, the combination of multiple filters is advantageous because large windows have 
a higher detection probability, whereas small windows can be more precise or sensitive to 
fast changes. Accordingly, we show here in simulations that the MFT can even detect more 
change points than the best single window. 

In order to quantify this effect, we investigate the following random change point model. 
We simulate processes $ on (0, 700] in which the rate fiuctuates between four different values. 
The model includes rate changes of different size and in different time scales. Each process 
<I> is a piecewise composition of four independent renewal processes <I>i, . . . , $4 with Gamma- 
distributed life times with event rates fi^^ = 14, /ig ^ = 12, fi^^ = 10 and //J^ = 9. The 
change points for switches between the processes <I>i to <I>4 are given by a stationary renewal 
process <I>c on (0,T] with change points ci, . . . , C|$^|. In order to simulate change points in 
different time scales, the life times of are uniformly distributed on [0, 100]. The observed 
process $ is constructed from $1, . . . , <I>4 as follows: Set $|(oci] ^il(Oci]' i-^-' start in 
process $1. At the first change point ci choose independently and uniformly a process from 
|$2,$3,$4} and jump into this process, such that for example $|(ci C2] ~ ^'^\ici C2Y Third, 
jump back deterministically to <I>i at C2, i.e., set $1(^2 cs] ~ ^il(c2 ca]- Repeat the procedure; 
choosing uniformly a process from {^2-, ^a} at odd valued change points and returning to 
$1 at even valued change points. An example of the rate of the resulting process ^ is shown 
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in Figure 11 A. 



Figure 11 B indicates the percentage of correctly detected change points in 1000 sim- 
ulations of the described processes. A change point is called correctly detected if its h- 
neighborhood overlaps a true change point, whereas h corresponds to the window used for 
detection in the MFA. In order to identify the best individual window, the detection rate for 
the SFA is shown as a function of the window size h € {10, 11, ... , 100}. The percentage of 
correct detections is maximal at about 59% for a window size of about h = 28. Using the 
MFA with a set of multiple windows chosen here arbitrarily as H = {10, 15, . . . , 150}, the 
correct detection rate increases to about 66%. 
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Figure 11: Multiple filtering increases detection rate. (A) A random realization of the rate of a 
process $ used in the simulations. Intervals between change points are independent and unif(0, 100]- 
distributed. Simulated processes <i>i, . . . , $4 have independent and r(2, Ai)-distributed life times with 
rate parameters Ai — 28, A2 = 24, A3 = 20 and A4 = 18, leading to rates fi^^ — 14, fi2^ — 12, 
/Xg ^ = 10 and /j,4 ^ = 9 (indicated on the left). (B) Mean relative frequency of correct change point 
detections in simulated processes as a function of the chosen window size. The points represent the 
mean percentages of correct detections derived in 1000 simulations using SFA. The curve shows a 
filtered average. The window size with maximal detection probability of about 0.59 is about h « 28. 
The horizontal line marks the mean relative detection probability of about 0.66 for a set of multiple 
windows H = {10, 15, ... , 150}. 



6 Application to Spike Train Recordings 



6.1 Data Analysis 

In this section we apply the proposed MFT to a data set of 72 empirical spike train recordings 



that were reported partly in Schiemann et al. (2012). The recording time T was 540 — 
900 seconds per spike train, and the mean firing rate was about 6 spikes per second. The 
significance level was set to a = 5%. 



In order to choose the set of windows, we use the results from section 5.1.1 Figure 10 
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Briefly, a mean number of about 100 — 200 events in the smallest window is required in 
order to keep the asymptotic significance level for point processes with medium irregularity. 
Therefore, we choose a minimal window of hi = 25 for a mean rate of 6 Hz, and H = 
{25,50,75,100,125,150}. 



Figure 12 shows two spike train analyses in which multiple change points have been 
detected. As indicated by the different grayscales, different window sizes were used for change 
point estimation. From the set of 72 spike trains, 62 were identified as non-stationary. In 
50 spike trains, at least two change points were detected, and in 37 spike trains, more than 
one window was necessary for the detection of these change points. As one can see from the 
illustrations of the rate profiles in panels B and E, the estimated rate profile corresponds well 
to a rate estimate that is obtained from visual inspection. 

6.2 Practical Issues and R-Code 

In practice, the described procedures can be applied easily. Depending on a rough estimate 
of the overall rate and irregularity of the process, one needs to choose the smallest window 
such that the asymptotic properties are kept. One can then choose a set of windows up to the 
largest interesting time scale. Then, the threshold Q can be estimated by repeated simulation 



of the limit process {Lh t)t (equation (14)). 

We provide an R code (see supplementary material) that performs these steps efficiently 
within one single routine and returns an illustration comparable to Figure [T2j It also suggests 
a set of window sizes for a given time series of events. The code can be applied easily, using 
as input only a time series of events and (optional) a significance level and a set of windows, 
and returning a set of estimated change points. 



7 Discussion 

In this paper we have developed a multiple filter technique for the detection of change points 
in the event rate of time series. Motivated by the problem that rate stationarity of the 
underlying processes is crucial to many statistical analysis techniques, the multiple filter test 
(MFT) tests the null hypothesis of rate stationarity against the alternative of finitely many 
change points. In a second step, a multiple filter algorithm (MFA) identifies and locates an 
unspecified number of change points in the rate of the process. In addition, it includes a 
graphical representation in which strong deviations from rate stationarity can be visualized. 

As a first extension to existent approaches, we introduce a general class of point processes 
called renewal processes with varying variance (RPVV). In addition to standard renewal 
assumptions refiected, e.g., in Poisson or Gamma processes, an RPVV assumes that the 
variance of life times can show a certain degree of variability, which includes for example 
mixtures of Gamma processes in the simplest case. We propose RPVVs in order to account 
for the high variability of patterns observed empirically. 

In order to test the null hypothesis of rate stationarity against the alternative of finitely 
many change points, we extend a standard filtered derivative method which compares the 
number of events in adjacent windows in a moving window manner. Due to the general 
RPVV assumptions, statistical significance of deviations from rate stationarity cannot be 
tested by standard bootstrap approaches because the life times are not necessarily identically 



distributed. Therefore, we extend an asymptotic result of Steinebach and Eastwood (1995) 



to RPVVs and show that the limit {Lh,t)t of the filtered derivative process is a 2/i-dependent 
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Figure 12: Application of the MFT to two spike train recordings; T = 720, H = 
{25, 50, 75, 100, 125, 150}, a = 5%. (A) and (D) The scaled processes (i?h,t)t- Grayscales of the dif- 
ferent window sizes are indicated on the right. The dashed line marks the threshold Q, the; dctcictcd 
change points are marked by diamonds. In spike train 1, 10 change points are detected with two 
different windows; in spike train 2, 5 change points are detected with four different windows. (B) and 

(E) Rate histograms of the spike trains. Black step function indicates estimated firing rate. (C) and 

(F) Short sections of the spike trains. Arrows mark the estimated change points. 



and zero mean Gaussian process. Notably, this limit {Lh,t)t is independent of the underlying 
RPVV parameters such as the rate or the variances. Furthermore, the resulting asymptotics 
typically show faster convergence rates than the Gumbel extreme value asymptotic. By using 
the limit process, thresholds for testing the statistical significance of deviations from rate 
stationarity can be obtained by simulation. 

As a second extension to existent approaches, we combine multiple window sizes h e H 
in order to detect rate changes at fast and slow time scales simultaneously. In the present 
asymptotic setting, multiple window sizes can be combined easily because the set of processes 
{{Gh,t)t\h G H} depends on one underlying RPVV. In the same way, the set of limit processes 
{{Lh,t)t\h G H} depends on one underlying Brownian motion. In addition, the use of multiple 
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windows requires two considerations: first, the statistical properties of {Lh^t)t depend on the 
window size h. Therefore, we standardize the processes in order to give similar impact to 
every window size h. Second, change point detection requires an extended algorithm that 
combines the change points detected by multiple windows. Our multiple filter algorithm is 
based on the idea of preferring change points estimated by smaller windows to those estimated 
by larger windows. In a random change point model with multiple time scales that we used 
here for simulation, the test power of the MFT was even larger than the test power with the 
best individual window. 

The presented methods can be particularly relevant for practical applications. First, the 
general assumptions of RPVVs cover a high variability of patterns observed in empirical time 
series. Second, multiple filtering can take into account that rate changes in empirical time 
series can occur at fast and slow time scales simultaneously. In practice, one should keep in 
mind that the MFA always estimates a step function even when applied to a rate profile with 
gradual changes. Third, in order to enable an easy application of the MFA, we provide an R 
code that includes all necessary steps within one single routine. It can be computed efficiently, 
and it also includes a graphical illustration of the resulting filtered derivative processes, in 
which large values indicate deviations from rate stationarity. In an exemplary application of 
the MFA to single unit neuronal recordings, we obtained change point estimates that agreed 
closely with visual inspection. 

In summary, we believe that the present multiple filter technique can be useful for the 
estimation of change points in the event rate of time series of events. It may be used as a 
universal preprocessing step whenever statistical analysis methods are sensitive to deviations 
from rate stationarity. 
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A Appendix 



In this Appendix we prove Proposition 3.3.1 As part of the proof also the consistency of the 
estimators fiie, a? and P in (18)-(20) is shown, contained in Step III below. 



A.l Technical Preliminaries 

In this subsection we give some elementary facts for later use for which we do not claim 
originality. Note that the assumptions of all lemmas in this subsection are fulfilled for an 
RPVV as in Definition ETH 

First, we want to assure that the number of events Nt in an RPVV tends to infinity almost 
surely, while explosion is avoided. 

Lemma A. 1.1. Let {Ci}j>i be a sequence of independent, positive, integrable random vari- 
ables, interpreted as the life times of a point process on the positive line and {Nt)t>o the 
associated counting process as in (Mj. Then we have almost surely 



Nt ^ oo (t^oo). 



(28) 
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// the {£,i}i>i are square integrable and satisfy conditions ^ and ^ then for all t > 
have almost surely 



we 



Nt < oo. 



(29) 



Proof: For (28) note that Nt is increasing in t. For all fixed A: > we have 

U=l J n>l lj=l J 



n — )• OO . 



Since the are integrable we have -P(^i < oo) = 1. Continuity from above (applied twice) 
yields 

p ( n {'^" < '^}) ^ (e f. >")£„'- p (u {&> ^ }) 

Wn / \i=l / \i=l / 

k 

< hm V P > 



i=l 



This implies (28). 

For (29) first note that ^ and ([s]) imply Kolmogorov's conditions ([7])-([9]) with replaced 
by ^j. Hence we have the SLLN for (^i)i>i, i.e., (1/?^) ~^ ^ ^-s- as n — )• oo. This 

implies Y17=i C« ~^ oo a.s. and 



oo I n 



P(iV,<oo)=Pm ^e.>t =P lim 5]e.>t =1. 



□ 



\n=l K. i=l 



i=l 



Now we show that the number of events in successively increased windows, scaled with 
the widths of the windows, tends to the stationary rate almost surely. 

Lemma A. 1.2. Let {£,i}i>i be a sequence of independent, positive, square-integrable random 
variables satisfy conditions ^ and which are interpreted as the life times of a point 
process on the positive line and {Nt)t>o the associated counting process as in Then for 
all < s < t we have, as n ^ oo, almost surely 

Nnt - Nns _^ 1 

n{t — s) n 



Proof: As in the proof of Lemma A. 1.1 conditions ([2]) and ([5]) imply the SLLN for 



AAA 



i>l, 

we 



I.e., with Sn = X^iLi for n > 1, we have 5„/n — fi a.s. for n — )• oo. By Lemma 
have Nt ^ CO a.s. as t — )• oo, hence SNt/^t A* ^-s- as t — )• oo. Now, for all t > we find 
SNt l^t < SNt+ii SO that (for all t sufficiently large such that Nt > 1) 

Nt - Nt - Nt + 1 Nt ■ 
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Since the left and the right hand side tend to fi almost surely we obtain Nt/t ^ a.s. as 
t — )• oo. This implies, as n — )• oo, almost surely 

Nnt -Nns ^ t Nnt ^ ^ ^ ^_ 1 ^1 = 1 □ 

n{t — s) t — s nt t — s ns t — s fi t — s fj, fj, 



The next result will secure that the events in the different windows evolve properly in 
time. 

Lemma A. 1.3. Let {Nt)t>o be a counting process with Nq = such that for some // > and 
for all < s < t we have Nnt — Nns ~ n{t — s)/n almost surely. Further let Vi, V2, ... be a 
sequence of independent random variables that satisfies the SLLN. Then for all < s < t we 
have, as n —)• 00, almost surely 



c. 



Nnt -Nns . ^ 

Proof: Note that choosing s = in the statement of the Lemma implies Nt ~ t/fi a.s., 
such that we find A^f — >• 00 as t — )• 00. Then we calculate (for Nns > 0, the case Nns = 
being similar) 

1 y _ Nns 1 y, I ^nt — Nns 1 ST^ y 

N^t^ "'NruNr^s^ ' iV;^ Nnt " Nns . ' 

1=1 1=1 z=N„s+l 



so that, for n oo. 



1 AT ( 1 N 1 ^"'^ ^ 

- ^ns . jj^^, Nnt - Nns \ Nnt^, Nnt Nns ^ 



Nnt - Nns 



t I' S , 

c c = c, a.s. □ 



t- s \ t 



Corollary A. 1.4. Let (fi)j>i be a sequence in M with (l/ra) Vi ^ c as n ^ oo. Then 
for all < s < t, as n ^ oo, we have 

y Vi^ c. 



n{t — s) 

i= \ns\ +1 



Finally, we provide a result related to Lemma A. 1.3 for the Lindeberg condition which 
will be used below to apply the Lindeberg- Feller CLT for triangular schemes. 

Lemma A. 1.5. Let {ii\i>i be a sequence of independent, square-integrable random variables 
satisfying conditions Q), and Then, for all < s < t and all e > we have, as 
n — >• oo, 

i=Ln.J+l 

where sl{s,t) := E1=lLj+i ^'^^te)- 
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Proof: Let < s < t. Condition ([s]) and Corollary A. 1.4 imply, as n — )• oo, 
slis,t) ~ (t - s)a'^n ~ - 



i=l 



For e > set r/ := e-y/ (t — s)/{2t). It follows the existence of an element no = no(s, t) G N so 
that for all n > riQ and an appropriate null sequence o(l) we have 



ehlis, t) = (1 + o(l))£2 ^ Var(ei) > v'^Yar{Ci 



lnt\ 



LntJ 



i=l 



Thus, for n > no we obtain 

LntJ 



^"^^'^^ i=LnsJ+l 



LntJ 



^ (1 + ^Intli ... E^ - 



{(5,-M)2>£24(.,t)}J 



and the last expression tends to zero due to condition (|4|). □ 



A. 2 Proof of Proposition 3.3.1 



For r > the set of all real- valued continuous functions on [0, r] is denoted by C[0, r], the set 
of all cadlag functions by L'[0, r]. We abbreviate the metric induced by the supremum norm 
by d\\.\\, the Skorokhod metric on D[0, r] by dsK- Analogously, we define C[0,oo) and use 
the metric . y which induces the topology of compact convergence. Further, we use D[0, 00) 
and D[h,T — h] with dsK- Note that convergence in (L>[0, cxd), . y) implies convergence in 
(Z)[0,oo),dsK). 



Proposition 3.3.1 is shown in three steps: 



I: First we show that the counting process {Nt)t>o, properly normalized, converges weakly 
to a standard Brownian motion. 

II: With the continuous mapping theorem we conclude that the process (rj^"'') converges 
weakly (in the Skorokhod topology) to the process (L^.^j) defined in (24). 



Ill: We show strong consistency of s"^. This allows to switch from (r|^"]) to (G^''^"") at least 



for fdd-convergence as claimed in Proposition 3.3.1 



Step I: 

For the first step we use the following result from Billingsley (1999 Thm. 14.6. 
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Proposition A. 2.1. Let (^j)j>i be a sequence of positive random variables and {Wt)t>o be a 
standard Brownian motion. Assume the existence of positive constants /x and cr, so that the 
rescaled process X^^^ = (X^^"^)t>o defined via 



X 



[nt] 



a\/n 



(30) 



converges weakly to {Wt)t>Q in {D[0,oo),dsK)- Then, the rescaled counting process 
(Z^^"^)(>o defined as 



nt 



nt 



t > 0, 



(31) 



with Nt as in |I]) converges weakly to {Wt)t>o i"^ {D[0,oo),dsK)- 



The convergence of (X 



)t>o in (30) in (L'[0, oo), c^sk) can be shown via the stronger 



convergence in (/^[O, oo), d|| . ||). For the stronger statement we first show that (Xj^"'')(g[o_i] 
converges weakly to {Wt)t<£[o^i] in (Z)[0, 1], dy . y) applying the following Theorem 



Pollard 



(1984, Sec V, Thm 19). Then we conclude the convergence of {X 



A.2.2 



from 



)t>o 



the interval [0, 1]. 



Theorem A.2.2. Let Y,Y^^\Y^'^\ . . . be random elements of {D[0,l],d\\.\\] 
dependent life times. Suppose Y has continuous sample paths. Then, as n 
Yin) A^Y in (L>[0, l],d||.||) if and only if 



by extending 

each with in- 
> oo, we have 



1. A Fo, 



2. For all s, t with < s < t < 1 we have Y^ 



in) 



Ys 



in) 



Y-Y,. 



3. For alle > there exist a > 0, /3 > and an hq e N, such that P(|y/"'' -yi"^| < e) > /3 
for all t, s G [0, 1] with < t — s < a and all n > uq. 

We now a ssume that the sequence (^i)j>i of life times satisfies the conditions of the RPVV 



in Definition 2.1.1 and that X^"^ is constructed from {^i)i>i as in (30). The restriction of 



time to [0, 1] is denoted by := (X(")) 



We verify the conditions 1.-3. of Theorem A.2.2 for the sequence (y*-"^)n>i and the Brow- 



nian motion Y = (Wt)te[o,i]- First note that the have independent increments and Y 
has continuous sample paths. 

Condition 1. in Theorem I A. 2. 21 Clear. 

Condition 2. in Theorem I A. 2. 2 1 Note that for all n > 1 and all < s < t < 1 the increment 

in) (n) 

Y;"' - Ys' ' is the sum of elements of a triangular scheme. The n-th row of this scheme 
is of the type {{£,is„ - (Cis„+i - • • • , (6t„ - IJ-)/'^Vn}, hence it consists of 

independent random variables. 
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For the variance of the increments we have 



[nt\ 



it 



i=[?isj+l 



lnt\ 



fj^ n{t 



i= [nsj + 1 



for n — )• oo, where we use condition ( [3|) and Corohary A. 1.4 



Due to condition Q and Lemma A. 1.5, the Lindeberg condition is satisfied for the corre- 
sponding triangle scheme, so that the Lindeberg-Feher CLT imphes, as n — )• oo, 



(n) 



(32) 



Condition 3. in Theorem I A. 2. 21 Let e > 0. For all < s < t < 1 Chebyshev's inequality 
implies 



_ yH| < e) = 1 _ p(|y/") - yj")| > e) 
> 1- ^Var(y/") -y/")) 



72 (*-^^- 



cr^ n(t — s) 



[nt] 

Y Var(Ci 

i= [ns] +1 



> l-c,(t 



where we use condition ([s]), so that the constant does not depend on s,t and n. Now choose 
a > sufficiently small such that (3 > 0. 



Hence conditions 1.-3. of Theorem A. 2. 2 
verges weakly to {Wt)te[o,i] in (^[0; 1], ■ ||)- 



are satisfied, and we obtain that (y*-"'-*)n>i con- 



The convergence Y^"-^ (^t)fefo , i] ext ends to time t e [0,oo) by application of the 
following Theorem [a^ from [p"ollard| ( |l984 | Sec V, Thm 23). For r > let X('^'^) denote 
the restriction of to time [0,r], i.e., X^"'^) := {xi''^)te[o,r]- 

Theorem A. 2. 3. Let X'^^\ X''^\ X^'^\ . . . be random elements of D[0,oo), withX^^^ gC a.s., 
for some separable set C C (-D[0, oo), d\\ . y). Then the following statements are equivalent 

X(")Ax(°) m (i:>[0,oo),d||.||), (33) 
j^(n,r) ^ ^(o,r) (D[0, r] , d|| . || ) forall r>0. (34) 

This Theorem applies to the X^*^) above and choosing X^*^) = X as Brownian motion. 



Pollard 



(1984 



First note that C[0,oo) is a closed, separable subset of (£'[0, oo), d|| . y), see 
page 108). Condition ( |34[ ) has been shown for r = 1 above. For general r > it is easily 
obtained from the case r = 1 by proper re-normalization in time. 
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Hence, Theorem A. 2. 3 implies (33): 



)te[o,oo) — > {Wt)te[o,oo) oo) 



(35) 



in (D[0,oo),d||.||). 

In particular, we have the convergence (35) in (L'[0, oo), ^sk) so that the condition (30) 
is satisfied and Proposition |A.2.1| implies weak convergence of the rescaled counting process 
(Z("))(>o in (31) to a standard Brownian motion. This establishes Step I. 



Step II: 

Now we directly obtain convergence of the process F*^"^ := {T^f^j)teT-h defined in (22) as 
stated in Proposition |3.3.1| from the continuous mapping theorem and Step I: The map 
if : {D[0, oo), dsx) — ^ {D[h, T — h], (Isk) defined by 



t ^ 



{f{t + h)-f{t))-{f{t)-f{t-h)) 



2h 



te[h,T-h] 



is continuous. With the process as in Step I we have (^(Z^'^)) = F^"). Further, with 
L := {Lh^t)te[h,T-h] defined in (24) we have that L is distributed as <p{W) with a Brownian 
motion W. Hence, the continuous mapping theorem implies the first statement of (26) in 



Proposition 3.3.1 



Step III: 

We first prove the consistencies of fue and in (18) and af^ and o"^j in (19). We discuss the 



right window half. The left one follows analogously. We study asymptotics again in n while 
keeping t and h fixed. Recall the notation 

7 = lri{nt,nh) = {^j : Si,Si+i G {nt,n{t + h)], z = 1,2, . . .} 



{ii:i = Nnt + 2,..., N^u+h)] (Figure 13 ). 



Sn„,-1 Sn„, / ^N„i+2 ^N„,+4 ^Nnd+h) i^Nn(t+hl+l 

I — I 1-4 — I — I 1 — I 1 — I 1 — 1 — I — I — 



Figure 13: Life times used for the estimation of ^ and ti^ for fixed n. All life times in the interval 
{nt,n{t + h)] are used, starting with ^Ar„j+2, the first life time beyond nt, and ending with CAf„(t+fe)i 
the last life time before n{t + h). 



We find our empirical quantities from equations (|18|) and (|19|) as 

1 



flri = fj.ri{nt, nh) - 
and jlri = otherwise, and 
a^j = (T^iijit^ nh) 



- if N^it+h) - Nnt > 1, (36) 



(t+h) 



{(i-fif, if Nn^t+h) - Nnt > 2, 



(37) 
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and (T^j = otherwise. 

Note that by increasing n the window {nt, n{t + h)] is shifted to the right and the window 



size is increased (Figure 14) 





III II I II III III III llllll I III Mill 



n = 1 (- 
t 

n = 2 
general n 



t+h 



2t 



III I Hill III iiiii II I 



nt 



2(t+h) 



lllllllllllllllllllllllllllll lllllllllllll 



n(t+h) 



Figure 14: The mean and average variance of the life time distributions are estimated by all life times 
in the window (ni, nit + h)]. Increasing n shifts the window to the right and increases its size. 

Consistency of jl: The conditions ([2j) and (5| imply Kolmogorov's conditions with 

and Lemma 



A.1.2 



there replaced by ^j. Hence we have the SLLN for (^i)i>i. Lemma 
imply the strong consistency of jl: 



fiie{nt,nh) — fj., fLri{nt,nh) — ?■ /i almost surely {n — oo). 



A.1.3 



Consistency of a^: For Nn{t+h) ~ ^nt > 2 we find 



N, 



n(t + h) 



a'^..i{nt, nh) 



Nn(t+h) -Nnt-2 ^^^^^^^ 



+ 



(t+h) 



-2[i 



Nnit+h) -Nnt-2 ^^^^^^^ 



E ^0 + 



The expression in the squared brackets converges to — ^u^ almost surely due to the consistency 
of firi- We abbreviate af := Vor(^i) and center (^*)^ '■= - {(^f + Ai^), so that 



Nn(t+h) -Nnt-2 ^ N„(t+h) -Nnt-2 ^ ' 



+ 



E 



Here, the term in the squared brackets converges to o"^ + fj? almost surely, as n — )• oo, by 
condition (|3|) and Lemma A.1.3 Furthermore, condition Am now writes (l/ra) X]r=i(^r)^ ~^ 
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almost surely. Hence Lemma A. 1.3 implies 



N, 



n{t + h) 



*\2 



0, U.S., 



(38) 



as n — )• oo. Alltogether, we obtain the strong consistency of a'^: 

af^{nt, nh) — )• i{nt, nh) — )• cj^ almost surely (n — )• oo). 



Asymptotic behavior of s^: The strong consistencies of /i^e, Ari, 'S'fe and cr^j imply for defined 
in (|20|) that 



1.2/ ,N 2/10-2 ^ , . ^ 

— s (nt,nn) — )• — ^ almost surely (n — )• ooj. 
n //-^ 



(39) 



Second statement in (26): Now, we finish the proof of Proposition 3.3.1 From Step II, we 



know that F^") — )• L weakly in {D[h,T — h], (Isk)- This in particular implies weak convergence 
of all finite dimensional marginals, hence, for all ^ € N and all ti, . . . , S [h,T — h], we have, 

as n — )■ oo. 



h,ti h,te) ^ \^h,ti ^h,t 



Since we have 



^'^ s{nt,nh) 



from (39) and (40) we obtain 



Hence we have 



This finishes the proof of Proposition |3.3.1 



(40) 
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